AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsJan-27-2025, 03:52:26 GMT

Review for NeurIPS paper: Self-Imitation Learning via Generalized Lower Bound Q-learning

Weaknesses: The performance improvement is incremental and needs to be further evaluated. For example, each experiment should be conducted over 5 random seeds, instead of 3 seeds, for a more accurate comparison. Besides, in only 3 out of 8 environments, shown in Figure 2, the proposed method shows clear improvement. And more baseline methods should be considered, such as SAC. So, how does the generalise SIL compare to SIL in the Montezuma's Revenge task?

generalized lower bound q-learning, neurips paper, self-imitation learning, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.76)

Neural Information Processing SystemsJan-27-2025, 03:52:18 GMT

Review for NeurIPS paper: Self-Imitation Learning via Generalized Lower Bound Q-learning

The author response provided satisfactory answers to the concerns of the reviewers with respect to contraction/bias tradeoff, disconnect between the experimental results and theory, and variance of the estimator. This lead one reviewer to increase their score for this paper, which already had reasonably solid scores.

generalized lower bound q-learning, neurips paper, self-imitation learning, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.76)

Neural Information Processing SystemsOct-10-2024, 23:14:51 GMT

Self-Imitation Learning via Generalized Lower Bound Q-learning

generalized lower bound q-learning, q-learning, self-imitation learning, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceJan-28-2024

Harnessing Network Effect for Fake News Mitigation: Selecting Debunkers via Self-Imitation Learning

Xu, Xiaofei, Deng, Ke, Dann, Michael, Zhang, Xiuzhen

This study aims to minimize the influence of fake news on social networks by deploying debunkers to propagate true news. This is framed as a reinforcement learning problem, where, at each stage, one user is selected to propagate true news. A challenging issue is episodic reward where the "net" effect of selecting individual debunkers cannot be discerned from the interleaving information propagation on social networks, and only the collective effect from mitigation efforts can be observed. Existing Self-Imitation Learning (SIL) methods have shown promise in learning from episodic rewards, but are ill-suited to the real-world application of fake news mitigation because of their poor sample efficiency. To learn a more effective debunker selection policy for fake news mitigation, this study proposes NAGASIL - Negative sampling and state Augmented Generative Adversarial Self-Imitation Learning, which consists of two improvements geared towards fake news mitigation: learning from negative samples, and an augmented state representation to capture the "real" environment state by integrating the current observed state with the previous state-action pairs from the same campaign. Experiments on two social networks show that NAGASIL yields superior performance to standard GASIL and state-of-the-art fake news mitigation models.

debunker, mitigation, social network, (15 more...)

2402.03357

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Media > News (1.00)
Information Technology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

arXiv.org Artificial IntelligenceMay-26-2023

I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation

Bhagavatula, Chandra, Hwang, Jena D., Downey, Doug, Bras, Ronan Le, Lu, Ximing, Qin, Lianhui, Sakaguchi, Keisuke, Swayamdipta, Swabha, West, Peter, Choi, Yejin

Commonsense capabilities of pre-trained language models dramatically improve with scale, leading many to believe that scale is the only winning recipe. But is it? Here, we investigate an alternative that a priori seems impossible: can smaller language models (e.g., GPT-2) win over models that are orders of magnitude larger and better (e.g., GPT-3), if powered with novel commonsense distillation algorithms? The key intellectual challenge is to design a learning algorithm that achieve a competitive level of commonsense acquisition, without relying on the benefits of scale. In particular, we study generative models of commonsense knowledge, focusing on the task of generating generics, statements of commonsense facts about everyday concepts, e.g., birds can fly. We introduce I2D2, a novel commonsense distillation framework that loosely follows the Symbolic Knowledge Distillation of West et al. but breaks the dependence on the extreme-scale teacher model with two innovations: (1) the novel adaptation of NeuroLogic Decoding to enhance the generation quality of the weak, off-the-shelf language models, and (2) self-imitation learning to iteratively learn from the model's own enhanced commonsense acquisition capabilities. Empirical results suggest that scale is not the only way, as novel algorithms can be a promising alternative. Moreover, our study leads to a new corpus of generics, Gen-A-tomic, that is the largest and highest quality available to date.

large language model, machine learning, natural language, (23 more...)

2212.09246

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Japan > Honshū > Tōhoku (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.62)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.77)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.63)
(2 more...)

#artificialintelligenceJan-1-2023, 16:50:12 GMT

Working with the concept of Self-Imitation Learning part1(Machine Learning)

Abstract: Imitation learning (IL) enables robots to acquire skills quickly by transferring expert knowledge, which is widely adopted in reinforcement learning (RL) to initialize exploration. However, in long-horizon motion planning tasks, a challenging problem in deploying IL and RL methods is how to generate and collect massive, broadly distributed data such that these methods can generalize effectively. In this work, we solve this problem using our proposed approach called {self-imitation learning by planning (SILP)}, where demonstration data are collected automatically by planning on the visited states from the current policy. SILP is inspired by the observation that successfully visited states in the early reinforcement learning stage are collision-free nodes in the graph-search based motion planner, so we can plan and relabel robot's own trials as demonstrations for policy learning. Due to these self-generated demonstrations, we relieve the human operator from the laborious data preparation process required by IL and RL methods in solving complex motion planning tasks.

algorithm, learning, self-imitation learning, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

arXiv.org Artificial IntelligenceDec-7-2022

Accelerating Self-Imitation Learning from Demonstrations via Policy Constraints and Q-Ensemble

Li, Chao

Deep reinforcement learning (DRL) provides a new way to generate robot control policy. However, the process of training control policy requires lengthy exploration, resulting in a low sample efficiency of reinforcement learning (RL) in real-world tasks. Both imitation learning (IL) and learning from demonstrations (LfD) improve the training process by using expert demonstrations, but imperfect expert demonstrations can mislead policy improvement. Offline to Online reinforcement learning requires a lot of offline data to initialize the policy, and distribution shift can easily lead to performance degradation during online fine-tuning. To solve the above problems, we propose a learning from demonstrations method named A-SILfD, which treats expert demonstrations as the agent's successful experiences and uses experiences to constrain policy improvement. Furthermore, we prevent performance degradation due to large estimation errors in the Q-function by the ensemble Q-functions. Our experiments show that A-SILfD can significantly improve sample efficiency using a small number of different quality expert demonstrations. In four Mujoco continuous control tasks, A-SILfD can significantly outperform baseline methods after 150,000 steps of online training and is not misled by imperfect expert demonstrations during training.

demonstration, machine learning, reinforcement learning, (16 more...)

2212.03562

Country:

North America > United States > Massachusetts > Middlesex County > Reading (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Education > Educational Setting > Online (0.71)
Leisure & Entertainment (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceSep-13-2022

Learning Category-Level Generalizable Object Manipulation Policy via Generative Adversarial Self-Imitation Learning from Demonstrations

Shen, Hao, Wan, Weikang, Wang, He

Generalizable object manipulation skills are critical for intelligent and multi-functional robots to work in real-world complex scenes. Despite the recent progress in reinforcement learning, it is still very challenging to learn a generalizable manipulation policy that can handle a category of geometrically diverse articulated objects. In this work, we tackle this category-level object manipulation policy learning problem via imitation learning in a task-agnostic manner, where we assume no handcrafted dense rewards but only a terminal reward. Given this novel and challenging generalizable policy learning problem, we identify several key issues that can fail the previous imitation learning algorithms and hinder the generalization to unseen instances. We then propose several general but critical techniques, including generative adversarial self-imitation learning from demonstrations, progressive growing of discriminator, and instance-balancing for expert buffer, that accurately pinpoints and tackles these issues and can benefit category-level manipulation policy learning regardless of the tasks. Our experiments on ManiSkill benchmarks demonstrate a remarkable improvement on all tasks and our ablation studies further validate the contribution of each proposed technique.

demonstration, discriminator, expert buffer, (12 more...)

2203.02107

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.50)

Industry: Education > Focused Education > Special Education (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

arXiv.org Artificial IntelligenceDec-7-2021

JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning

Lin, Zichuan, Li, Junyou, Shi, Jianing, Ye, Deheng, Fu, Qiang, Yang, Wei

Learning rational behaviors in open-world games like Minecraft remains to be challenging for Reinforcement Learning (RL) research due to the compound challenge of partial observability, high-dimensional visual perception and delayed reward. To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration. Specifically, our approach includes two levels of hierarchy, where the high-level controller learns a policy to control over options and the low-level workers learn to solve each sub-task. To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning which captures underlying relations between action and representation, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering for policy robustness. Extensive experiments show that JueWu-MC significantly improves sample efficiency and outperforms a set of baselines by a large margin. Notably, we won the championship of the NeurIPS MineRL 2021 research competition and achieved the highest performance score ever.

agent, demonstration, minecraft, (14 more...)

2112.04907

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)